Improving Word Segmentation by Simultaneously Learning Phonotactics
Abstract
The most accurate unsupervised word segmentation systems currently available (Brent, 1999; Venkataraman, 2001; Goldwater, 2007) use a simple unigram model of phonotactics. While this simplifies some of the calculations, it overlooks cues that infant language acquisition researchers have shown to be useful for segmentation (Mattys et al., 1999; Mattys and Jusczyk, 2001). Here we explore the utility of bigram and trigram phonotactic models by enhancing Brent's (1999) MBDP-1 algorithm. The results show that the improved MBDP-Phon model outperforms other unsupervised word segmentation systems (e.g., Brent, 1999; Venkataraman, 2001; Goldwater, 2007).
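To make the unigram versus bigram/trigram distinction concrete, the following is a minimal sketch of a generic n-gram phonotactic model. It is not the MBDP-Phon scoring function from the paper; it assumes that each character stands for one phoneme, that `#` marks word boundaries, and that counts are smoothed with add-one (Laplace) smoothing. The class name `NgramPhonotactics` and the toy corpus are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

BOUNDARY = "#"  # assumed word-boundary symbol (illustrative choice)


class NgramPhonotactics:
    """Add-one-smoothed n-gram phoneme model; a generic sketch, not MBDP-Phon."""

    def __init__(self, n: int) -> None:
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n
        self.ngram_counts: Counter = Counter()    # counts of (context..., phoneme)
        self.context_counts: Counter = Counter()  # counts of each (n-1)-phoneme context
        self.inventory: set = {BOUNDARY}          # phonemes observed so far

    def _events(self, word: str):
        # Pad with n-1 boundaries on the left and one on the right, so the
        # model also scores how words begin and end.
        seq = [BOUNDARY] * (self.n - 1) + list(word) + [BOUNDARY]
        for i in range(self.n - 1, len(seq)):
            yield tuple(seq[i - self.n + 1 : i]), seq[i]

    def observe(self, word: str) -> None:
        """Update counts with one segmented word (a string of phonemes)."""
        self.inventory.update(word)
        for context, phoneme in self._events(word):
            self.ngram_counts[context + (phoneme,)] += 1
            self.context_counts[context] += 1

    def prob(self, word: str) -> float:
        """P(word) as a product of add-one-smoothed P(phoneme | context)."""
        v = len(self.inventory)  # unseen phonemes still get the add-one floor
        return math.prod(
            (self.ngram_counts[context + (phoneme,)] + 1)
            / (self.context_counts[context] + v)
            for context, phoneme in self._events(word)
        )


# Toy corpus: each character stands in for one phoneme (hypothetical data).
corpus = ["kat", "kit", "tak", "bat"]
unigram, bigram = NgramPhonotactics(1), NgramPhonotactics(2)
for w in corpus:
    unigram.observe(w)
    bigram.observe(w)

# A unigram model scores any reordering of the same phonemes identically...
print(math.isclose(unigram.prob("kat"), unigram.prob("tka")))  # True
# ...whereas a bigram model prefers the attested ordering.
print(bigram.prob("kat") > bigram.prob("tka"))  # True
```

Because a unigram model multiplies context-free phoneme probabilities, it cannot tell an attested sequence from a scrambled one; conditioning on the previous phoneme (or two, with `NgramPhonotactics(3)`) is what lets a model exploit the ordering cues that the abstract refers to.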
Similar articles
Why segmentation matters: Experience-driven segmentation errors impair "morpheme" learning.
We ask whether an adult learner's knowledge of their native language impedes statistical learning in a new language beyond just word segmentation (as previously shown). In particular, we examine the impact of native-language word-form phonotactics on learners' ability to segment words into their component morphemes and learn phonologically triggered variation of morphemes. We find that learning...
Why Adult Language Learning is Harder: A Computational Model of the Consequences of Cultural Selection for Learnability
This paper reports on a limited model of language evolution that incorporates transmission noise and errorful learning as sources of variation. The model illustrates how the adaptation of language to the statistical learning mechanisms of infants may be a factor in the apparent ceiling on adult second language achievement. The model is limited in its focus to only phonotactics because the proba...
Comparing Models of Phonotactics for Word Segmentation
Developmental research indicates that infants use low-level statistical regularities, or phonotactics, to segment words from continuous speech. In this paper, we present a segmentation framework that enables the direct comparison of different phonotactic models for segmentation. We compare a model using phoneme transitional probabilities, which have been widely used in computational models, to ...
The Effect of Sonority on Word Segmentation: Evidence for a Phonological Universal
It has been well documented that language specific cues—such as transitional probability (TP), stress and phonotactics—can be used for word segmentation. In our current work, we investigate what role a phonological universal, the sonority sequencing principle (SSP), may also play. Participants were presented with an unsegmented stream of speech from an artificial language with non-English onset...
Use of Word Segmentation Cues in Adults: L1 Phonotactics versus L2 Transitional Probabilities
We investigate whether adult learners' knowledge of phonotactic restrictions on word forms from their first language (L1) impacts their word segmentation abilities in a new language. Adult learners were exposed to a speech stream in which language specific and non-language specific cues for word segmentation were pitted against one another. English rules about possible phonetic combinations (pho...
Publication date: 2008